The complete chloroplast genome of Mussaenda pubescens and phylogenetic analysis

The chloroplast (cp) genome sequence of Mussaenda pubescens, a promising resource used as a traditional medicine and drink, is important for understanding phylogenetic relationships within the genus Mussaenda and for genetic improvement and conservation. This study presents the first comprehensive description of the morphological characteristics of M. pubescens, together with an analysis of its complete cp genome and phylogenetic relationships. The morphological characteristics of the flowers and leaves indicated a close relationship between M. pubescens and M. hirsutula. The cp genome was sequenced on the Illumina NovaSeq 6000 platform. It spans 155,122 bp and comprises a pair of inverted repeats (IRA and IRB) of 25,871 bp each, a large single-copy (LSC) region of 85,370 bp, and a small single-copy (SSC) region of 18,010 bp. Phylogenetic analyses showed that species of the same genus tended to group closely together and suggested that Antirhea, Cinchona, Mitragyna, Neolamarckia, and Uncaria may have diverged early. Furthermore, M. hirsutula was genetically closely related to M. pubescens, and the two species have partially overlapping distributions in China. This study provides key findings for the identification, evolution, and phylogenetic study of Mussaenda plants, particularly M. pubescens.


Plant material and morphological observations
All the materials were collected during the Third Botanical Resources Survey in Libo County (Guizhou, China, 26° 12′ N, 107° 30′ E) in September 2020. The species was identified as M. pubescens by a professor of plant taxonomy at Qiannan Normal University for Nationalities through comparison with specimens of Chinese Mussaenda. The voucher specimen (DZ20210505) was deposited in the herbarium of Qiannan Normal University for Nationalities in Duyun City, Guizhou Province, China. Morphological features of M. pubescens, including branches, leaves, stipules, inflorescences, calyx lobes, corolla tubes, florescences, and fruits, were observed and compared (Fig. 1).

Complete chloroplast genome sequence of M. pubescens
Leaves of M. pubescens were collected on September 15, 2020 (Fig. 1) and immediately stored at −80 °C. Total genomic DNA was isolated using a modified CTAB method 22. Library construction and sequencing were performed by Novogene Bioinformatics Technology Co. Ltd. (Guangzhou, China) on the Illumina NovaSeq 6000 platform with a paired-end 150 bp (PE150) strategy. Raw read quality was assessed with FastQC, and low-quality reads and adapter sequences were removed; the genome was then assembled de novo using SPAdes v3.9 23 and annotated using Plann 22. The preliminary annotation was verified with BLASTN and BLASTP against the protein and rRNA sequences of previously reported cp genomes of related species. The complete chloroplast genome was submitted to the Database Resources of the National Genomics Data Center, China National Center for Bioinformation (Genome Sequence

Phylogenetic analysis
A total of 28 complete chloroplast genome sequences from 24 Rubiaceae species were obtained from NCBI GenBank to determine the phylogenetic position of M. pubescens within Rubiaceae. The Bayesian inference (BI) tree was constructed following standard protocols. The 28 cp genome sequences were aligned using the online version of MAFFT 24 with default parameters. MAFFT, ModelFinder, IQ-TREE (with ultrafast bootstrap), and MrBayes were run within the PhyloSuite framework 25. The resulting tree was visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Ethics approval and consent to participate
Mussaenda pubescens is not endangered in China, and no specific permission was required for its collection. All M. pubescens materials in this study were collected in the germplasm resource nursery of the College of Biological Science and Agriculture, Qiannan Normal University for Nationalities (QNUN), with the permission of the university. The study complied with relevant institutional, national, and international guidelines and legislation.

Morphological characteristics of M. pubescens
The collected specimens of M. pubescens were compared with specimens of Chinese Mussaenda (Table 1). The morphological features of M. pubescens, including branches, leaves, stipules, inflorescences, calyx lobes, corolla tubes, florescences, and fruits, are shown in Fig. 1.
Distribution and habitat: M. pubescens is currently known from the provinces of Zhejiang, Hainan, Yunnan, Guizhou, Sichuan, Guangxi, Guangdong, Hubei, Fujian, Hunan, Jiangxi, and Taiwan in China. It commonly grows in dense thickets in ravines, on hill slopes, and along village margins or roadsides, at elevations of 100-900 m a.s.l.
Fruiting period: June to December. Berries subglobose, 8-10 × 6-7.5 mm, smooth, fleshy or stiffly papery, sparsely pilose; stipe 4-5 mm; black when dry. Similar species: M. pubescens has a functionally dioecious breeding system. A Key to Dioecious Species of Mussaenda in China is provided below. Trichome form and the morphology of the calyx and corolla are important characters for species identification 7. Although M. pubescens is morphologically similar to M. hirsutula 28, it is easily distinguished by its corolla tube 11-20 mm long, stems frequently bearing axillary short shoots with small leaves 26,27, and sessile to pedicellate flowers ranging from white to yellow. Differences between the two species are given in Table 1.
Etymology: The specific epithet, from Latin pubescens ("downy"), refers to the pubescence of the plant.
Paratypes: China, Guangxi:

Chloroplast genome features of M. pubescens
Chloroplast DNA sequences, characterized by a highly conserved structure, little recombination, and predominantly uniparental inheritance, have long been preferred markers for reconstructing plant phylogeny. A robust phylogeny is essential for investigating the evolution of traits across taxonomic levels. The chloroplast genome provides essential information for species identification, population genetics, phylogenetics, and genetic engineering 14,15,29. After filtering, 20,928,581 high-quality paired-end reads were obtained from the Illumina NovaSeq platform. The Q20 and Q30 values (the percentages of bases with Phred quality scores of at least 20 and 30) were 97.53% and 92.98%, respectively. The complete cp genome of M. pubescens was assembled de novo and submitted to the Database Resources of the National Genomics Data Center (GSA accession number: CRA013306).
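Q20 and Q30 are per-base percentages computed from read quality strings. As a minimal sketch, assuming Phred+33 (Illumina 1.8+) quality encoding and plain four-line FASTQ records, they could be computed as follows; the function names are hypothetical, and this is not the pipeline used by the sequencing provider.

```python
def q_fractions(quality_strings, thresholds=(20, 30), offset=33):
    """Percentage of bases whose Phred score is >= each threshold.

    quality_strings: iterable of FASTQ quality lines (line 4 of each record).
    offset: ASCII offset of the quality encoding; Phred+33 is assumed here.
    """
    total = 0
    passing = {t: 0 for t in thresholds}
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset  # decode one Phred score
            total += 1
            for t in thresholds:
                if score >= t:
                    passing[t] += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return {t: 100.0 * passing[t] / total for t in thresholds}


def read_fastq_qualities(lines):
    """Yield the quality line of each four-line FASTQ record."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i + 3]


if __name__ == "__main__":
    # Toy reads (hypothetical): 'I' = Q40, '5' = Q20, '#' = Q2.
    toy = ["@r1", "ACGT", "+", "II5#", "@r2", "GGCC", "+", "IIII"]
    print(q_fractions(read_fastq_qualities(toy)))  # {20: 87.5, 30: 75.0}
```

In practice the same counts are usually reported by QC tools directly; the sketch only makes the definition explicit.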
Plant chloroplast genomes contain photosynthetic genes, genes involved in chloroplast transcription and expression, and other protein-coding genes 30. Expansion and contraction of the inverted repeat (IR) boundaries are the primary contributors to variation in cp genome size and play a crucial role in species evolution 31. As in other angiosperms, the cp genome of M. pubescens is a circular molecule of 155,122 bp with the typical quadripartite structure: a pair of inverted repeats (IRA and IRB) of 25,871 bp each, separated by a large single-copy (LSC) region of 85,370 bp and a small single-copy (SSC) region of 18,010 bp (Table 2 and Fig. 2). IR expansion and contraction may be the main source of cp genome length variation between M. pubescens and its close relatives 9,32.
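Because the quadripartite structure contains two copies of the IR, the region lengths in Table 2 can be checked against the total genome length. The sketch below also expresses the defining property of the inverted repeats, namely that IRB is the reverse complement of IRA; the helper names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def quadripartite_total(lsc, ssc, ir):
    """Total genome length = LSC + SSC + two copies of the IR."""
    return lsc + ssc + 2 * ir


def irs_are_inverted(ira, irb):
    """True if IRB is exactly the reverse complement of IRA."""
    return irb == reverse_complement(ira)


if __name__ == "__main__":
    # Region lengths reported for M. pubescens.
    print(quadripartite_total(85_370, 18_010, 25_871))  # 155122
    print(irs_are_inverted("AACGT", "ACGTT"))  # True
```

The first check reproduces the reported total of 155,122 bp from the three region lengths.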
GC content varies across regions of the cp genome and is particularly high in the IR regions, probably because of the rRNA genes they contain 33. In M. pubescens, the GC contents of the IR and SSC regions were 43.17% and 31.89%, respectively, and the GC content of the whole genome was 37.67% (Table 3). The IR regions had a higher GC content than the LSC and SSC regions (Fig. 2), consistent with patterns observed in other flowering plants 34,35. In addition, GC skew has been used as an indicator of leading and lagging strands and of replication origins and termini, and has been proposed as an informative character for assessing species affinity 36.
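Both GC content and GC skew are simple base-count ratios, with GC skew defined as (G − C)/(G + C). A minimal sketch, assuming a plain nucleotide string and ignoring ambiguous bases in the GC percentage; the window size and step used for a genome map are the analyst's choice and are not given in the source.

```python
def gc_content(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_skew(seq):
    """GC skew = (G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def sliding_gc_skew(seq, window, step):
    """(start, skew) pairs for fixed-size windows along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(gc_content("GGGCATAT"), gc_skew("GGGCATAT"))  # 50.0 0.5
    print(sliding_gc_skew("GGGGCCCC", 4, 4))  # [(0, 1.0), (4, -1.0)]
```

Applied per region (LSC, SSC, IR), `gc_content` yields the kind of values summarized in Table 3.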
Functions were assigned to all genes (Table 2), which were classified into four categories: self-replication genes, photosynthesis genes, genes of unknown function, and other genes, such as those encoding maturase (matK) and protease (clpP). The 128 identified genes comprised 86 protein-coding genes, 34 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes, similar to the gene content reported for other Mussaenda species 37,38.
In total, 62 protein-coding genes and 22 tRNA genes were located in the LSC region, while 11 protein-coding genes and 1 tRNA gene were located in the SSC region (Fig. 2). Twenty genes contained introns: 17 (e.g., ndhA, ndhB, petB, petD, atpF, rpl16, rpl2, and rps16) had one intron, and only 3 (rps12, clpP1, and trnH-GUG) had two introns (Table 4). The longest intron (1,115 bp) was found in ndhA and the shortest (16 bp) in trnI-GAU (Table 5). Notably, rps12 is a trans-spliced gene with two introns: its 5′-end exon lies in the LSC region, whereas its two 3′-end exons lie in the IR regions (Fig. 2), consistent with the homologous species M. hirsutula 38.
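Intron and exon sizes such as those in Table 5 follow directly from exon coordinates. A minimal sketch, assuming 1-based inclusive coordinates (as in GenBank feature tables) for a cis-spliced gene; the function names are hypothetical, and trans-spliced genes such as rps12 are deliberately not handled, since their exons lie in different genome regions.

```python
def exon_lengths(exons):
    """Exon lengths from (start, end) pairs in 1-based, inclusive coordinates."""
    return [end - start + 1 for start, end in sorted(exons)]


def intron_lengths(exons):
    """Intron lengths between consecutive exons of a cis-spliced gene.

    Trans-spliced genes (e.g. rps12) are NOT handled: their exons are joined
    at the RNA level, so genomic gaps between them are not introns.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1  # bases strictly between the two exons
        if gap < 0:
            raise ValueError("exons overlap")
        lengths.append(gap)
    return lengths


if __name__ == "__main__":
    hypothetical_exons = [(100, 200), (301, 400), (421, 500)]
    print(exon_lengths(hypothetical_exons))  # [101, 100, 80]
    print(intron_lengths(hypothetical_exons))  # [100, 20]
```

For example, hypothetical exons at (100, 200), (301, 400), and (421, 500) give a two-intron gene with introns of 100 and 20 bp.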
Codon usage bias in chloroplast genomes may arise from a combination of natural selection and mutation; investigating it provides insight into the evolutionary processes and functional constraints that shape chloroplast genes 39. The protein-coding genes of the M. pubescens cp genome comprised 19,850 codons in total. The four most frequently used codons were ATT (isoleucine), GAA (glutamic acid), AAT (asparagine), and AAA (lysine), accounting for 843 (4.25%), 752 (3.79%), 723 (3.64%), and 715 (3.60%) codons, respectively. Leucine was among the most frequently encoded amino acids (2,155 codons), whereas cysteine was the least frequent (277 codons). Codons ending in A or T accounted for 69.13% of all codons (Table 6), in line with previous studies on angiosperms 40. These codon usage preferences may help clarify exogenous gene expression and the mechanisms driving cp genome evolution 41,42.
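The source does not name the tool behind the RSCU values in Table 6. As a sketch under that caveat, the commonly used definition sets the relative synonymous codon usage (RSCU) of a codon to its observed count divided by the mean count of all codons for the same amino acid, so RSCU > 1 marks a preferred codon. The code assumes in-frame coding sequences and the standard amino-acid assignments, which the plastid code (NCBI translation table 11) shares; all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def count_codons(cds_list):
    """Count in-frame codons; trailing partial or ambiguous codons are skipped."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous codon family."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def third_position_at(counts):
    """Percentage of codons whose third base is A or T."""
    total = sum(counts.values())
    return 100.0 * sum(n for c, n in counts.items() if c[2] in "AT") / total


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


if __name__ == "__main__":
    counts = count_codons(["ATGTTATTGCTTTAA"])  # toy sequence (hypothetical)
    print(rscu(counts)["TTA"], third_position_at(counts))  # 2.0 60.0
    print(percent(843, 19_850))  # 4.25, the reported share of ATT
```

The last line reproduces one of the percentages above from the reported counts (843 of 19,850 codons).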

Phylogenetic analysis
Chloroplast genomes are valuable sources of information for species identification and evolutionary analysis 19. They serve as organelle-based "barcodes" for distinguishing species and resolving interspecific phylogenetic relationships 43. Advances in next-generation (second-generation) sequencing have made chloroplast genome sequencing considerably simpler, and a growing body of studies has used complete chloroplast genome sequences to examine phylogenetic relationships within angiosperms. To determine the phylogenetic position of M. pubescens, 19 complete cp genomes, all from Rubiaceae, were downloaded from GenBank. The genomes were aligned using MAFFT 24, and a phylogenetic tree was constructed with MEGA X v10.0.5 44 using the maximum likelihood method with 1000 bootstrap replicates (Fig. 3). The cp genome sequences were informative for resolving phylogenetic relationships among Mussaenda species. The diversification of M. pubescens may be attributable to a highly diverse gene set, intermolecular recombination in the LSC region, and the co-occurrence of tandem repeats. Phylogenetic analyses based on the complete cp genomes, the LSC regions, and the SSC regions yielded highly similar topologies (the complete cp genome tree was fully consistent with the LSC tree), and every node received high bootstrap support except those involving Antirhea chinensis and Cinchona officinalis (Fig. 3).
Species of the same genus tended to cluster together on a single large branch. The Rubiaceae branch was divided into two clades: one comprising Antirhea, Cinchona, Mitragyna, Neolamarckia, and Uncaria, and the other comprising the remaining 20 genera. Mussaenda was related to 19 other genera; among these, M. pubescens was closely related to M. hirsutula and to five other genera (Emmenopterys, Coffea, Diplospora, Ixora, and Trailliaedoxa). The phylogenetic tree suggested that Mussaenda was polyphyletic and arose from the intercalation of branches with M. hirsutula. M. pubescens and M. hirsutula were closely related species. These findings support the identification and taxonomic study of M. pubescens, facilitating resource collection, cultivation, studies of pharmacological activity, and the development of functional products.
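The 1000 bootstrap replicates used for the maximum likelihood tree follow the standard nonparametric bootstrap: alignment columns are resampled with replacement, a tree is inferred from each pseudo-alignment, and a node's support is the percentage of replicates that recover it. The sketch below illustrates only this resampling and counting logic. `closest_pair_clade` is a deliberately trivial, hypothetical stand-in for the maximum likelihood search that MEGA X performs on each replicate, and IQ-TREE's ultrafast bootstrap uses a different approximation.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """One pseudo-alignment: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair_clade(alignment):
    """Toy stand-in for tree inference (hypothetical): the only clade it
    'recovers' is the pair of sequences with the smallest p-distance."""
    pair = min(combinations(sorted(alignment), 2),
               key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))
    return {frozenset(pair)}


def bootstrap_support(alignment, clade, infer_clades, replicates=1000, seed=1):
    """Percentage of replicates whose inferred clades include `clade`."""
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = sum(target in infer_clades(bootstrap_alignment(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates


if __name__ == "__main__":
    aln = {"A": "ACGTACGTAC", "B": "ACGTACGTAC",
           "C": "TTTTTTTTTT", "D": "GGGGGGGGGG"}  # toy alignment
    print(bootstrap_support(aln, {"A", "B"}, closest_pair_clade))  # 100.0
```

Passing the inference step as a function keeps the resampling independent of the tree-building method, so the same loop applies whether trees come from maximum likelihood or another approach.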

Conclusion
Comparison of M. pubescens with specimens of Chinese Mussaenda indicated that M. pubescens is closely related to M. hirsutula. However, M. pubescens is distinguished by its turbinate corolla tubes 3-4 mm long, ovate-oblong or ovate-lanceolate leaves, and oblong-lanceolate corolla lobes. In addition, this study determined the complete cp genome of M. pubescens for the first time and characterized its basic structure, conservation, and variability. The complete cp genomes, together with their LSC and SSC regions, were used to investigate phylogenetic relationships within and between genera. This information offers useful references for subsequent studies on taxonomic identification, phylogenetics, population structure, and biodiversity within the genus Mussaenda, and the comparative analysis of cp genomes in Mussaenda contributes to our understanding of cp genome dynamics, complexity, and evolution within Rubiaceae. In conclusion, this study provides a valuable reference for future research on species identification, evolutionary relationships, and the development of genetic resources within Rubiaceae.

Figure 2. Chloroplast genome map of M. pubescens. Genes inside the circle are transcribed clockwise, whereas those outside are transcribed counterclockwise. In the inner circle, darker gray indicates GC content and lighter gray indicates AT content. Different colors represent genes of different functional groups.

Table 2. Characteristics of the M. pubescens cp genome.

Table 3. Nucleotide composition of the complete chloroplast genome of M. pubescens.

Table 5. Characteristics and sizes of the introns and exons of intron-containing genes in M. pubescens.

Table 6. Codon usage in the M. pubescens cp genome, based on relative synonymous codon usage (RSCU).